SPHier: Scalable Parallel Biclustering Using Weighted Bigraph Crossing Minimization

Authors

  • Waseem Ahmad
  • Ashfaq Khokhar
Abstract

Biclustering discovers correlations between subsets of attributes and subsets of transactions in a transaction database. Its applications range from gene co-regulation analysis [4] and document-keyword clustering [12] to collaborative filtering for online recommendation systems [13]. In this paper, we formulate the optimal biclustering problem as maximal crossing number reduction in a weighted bipartite graph. Based on this formulation, we present SPHier, a novel parallel biclustering algorithm built on weighted bigraph crossing minimization. Crossing minimization has been used extensively in graph drawing and in VLSI circuit layout to reduce wire congestion; to the best of our knowledge, its application to scalable parallel biclustering is investigated for the first time in this paper. We show that the crossing minimization approach provides a simple and intuitive method for identifying biclusters. Moreover, it is straightforward to parallelize and has excellent speedup characteristics. We have validated SPHier on synthetic and biological data sets, and we report performance results on a 32-node Linux cluster based on AMD Athlon processors.
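The link between crossing reduction and biclusters can be illustrated with a small sketch. This is not the SPHier algorithm, which the abstract does not specify. It assumes that a crossing between two edges is weighted by the product of their weights (one common convention), and it uses the generic barycenter heuristic as a stand-in reordering step. The names `weighted_crossings` and `barycenter_pass` are hypothetical.

```python
from itertools import combinations

def weighted_crossings(weights, row_order, col_order):
    """Sum of w1*w2 over crossing edge pairs (assumed weighting convention)."""
    row_pos = {r: i for i, r in enumerate(row_order)}
    col_pos = {c: i for i, c in enumerate(col_order)}
    edges = [(row_pos[r], col_pos[c], w)
             for r, row in enumerate(weights)
             for c, w in enumerate(row) if w != 0]
    total = 0
    for (r1, c1, w1), (r2, c2, w2) in combinations(edges, 2):
        # Two edges cross when their endpoints appear in opposite orders.
        if (r1 - r2) * (c1 - c2) < 0:
            total += w1 * w2
    return total

def barycenter_pass(weights, row_order, col_order):
    """One barycenter sweep: reorder rows, then columns (generic heuristic)."""
    n_rows = len(weights)
    col_pos = {c: i for i, c in enumerate(col_order)}

    def row_key(r):
        total = sum(weights[r])
        if total == 0:
            return 0.0
        return sum(w * col_pos[c] for c, w in enumerate(weights[r])) / total

    new_rows = sorted(row_order, key=row_key)  # stable sort keeps ties in place
    row_pos = {r: i for i, r in enumerate(new_rows)}

    def col_key(c):
        total = sum(weights[r][c] for r in range(n_rows))
        if total == 0:
            return 0.0
        return sum(weights[r][c] * row_pos[r] for r in range(n_rows)) / total

    new_cols = sorted(col_order, key=col_key)
    return new_rows, new_cols

# Toy matrix with two interleaved 2x2 biclusters (illustrative data only).
W = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
rows, cols = [0, 1, 2, 3], [0, 1, 2, 3]
before = weighted_crossings(W, rows, cols)   # 8
rows, cols = barycenter_pass(W, rows, cols)  # ([0, 2, 1, 3], [0, 2, 1, 3])
after = weighted_crossings(W, rows, cols)    # 2
```

After the sweep, rows and columns belonging to the same bicluster sit next to each other, so the reordered matrix is block diagonal and the biclusters can be read off as contiguous blocks.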


Similar resources

BiCross : A Biclustering Technique for Gene Expression Data using One Layer Fixed Weighted Bipartite Graph Crossing Minimization

Biclustering has become an important data mining technique for microarray gene expression analysis and profiling, as it provides a local view of the hidden relationships in data, unlike a global view provided by conventional clustering techniques. This technique, in contrast to the conventional clustering techniques, helps in identifying a subset of the genes and a subset of the experimental co...


BiFree: An Efficient Biclustering Technique for Gene Expression Data Using Two Layer Free Weighted Bipartite Graph Crossing Minimization

Conventional clustering techniques for gene expression data provide a global view of the data. From a biological perspective, a local view is essential for better analysis of gene expression data with simultaneous grouping of genes and conditions. Several biclustering techniques have been proposed in the literature based on different problem formulations. Therefore, it is difficult to compare th...


High Performance Parallel/Distributed Biclustering Using Barycenter Heuristic

Biclustering refers to the simultaneous clustering of objects and their features. The use of biclustering is gaining momentum in areas such as text mining, gene expression analysis and collaborative filtering. Due to requirements for high performance in large scale data processing applications such as collaborative filtering in e-commerce systems and large scale genome-wide gene expression analysis in ...


Evaluating iterative improvement heuristics for bigraph crossing minimization

The bigraph crossing problem, embedding the two node sets of a bipartite graph G = (V0, V1, E) along two parallel lines so that edge crossings are minimized, has applications in placement optimization for standard cells and other technologies. Iterative improvement heuristics involve repeated application of some transformation on an existing feasible solution to obtain better feasible solutions....


cHawk: An Efficient Biclustering Algorithm based on Bipartite Graph Crossing Minimization

Biclustering is a very useful data mining technique for gene expression analysis and profiling. It helps identify patterns where different genes are correlated over a subset of conditions. Bipartite spectral partitioning is a powerful technique for biclustering, but its computational complexity is prohibitive for applications dealing with large input data. We provide a connection betwee...




Publication date: 2006